1 Introduction.

Our starting point is the following well known theorem from probability: Let $X_1, \ldots, X_n$ be (stochastically) independent random variables with finite second moments, and let $S_n := \sum_{i=1}^n X_i$. Then

\[ \mathrm{Var}(S_n) = \sum_{i=1}^n \mathrm{Var}(X_i). \tag{1} \]

If we suppose that each $X_i$ has mean zero, $\mathbb{E} X_i = 0$, then (1) becomes

\[ \mathbb{E} S_n^2 = \sum_{i=1}^n \mathbb{E} X_i^2. \tag{2} \]

This equality generalizes easily to vectors in a Hilbert space $\mathbb{H}$ with inner product $\langle \cdot, \cdot \rangle$: If the $X_i$'s are independent with values in $\mathbb{H}$ such that $\mathbb{E} X_i = 0$ and $\mathbb{E}\|X_i\|^2 < \infty$, then $\|S_n\|^2 = \langle S_n, S_n \rangle = \sum_{i,j=1}^n \langle X_i, X_j \rangle$, and since $\mathbb{E}\langle X_i, X_j \rangle = 0$ for $i \ne j$ by independence,

\[ \mathbb{E}\|S_n\|^2 = \sum_{i,j=1}^n \mathbb{E}\langle X_i, X_j \rangle = \sum_{i=1}^n \mathbb{E}\|X_i\|^2. \tag{3} \]

What happens if the $X_i$'s take values in a (real) Banach space $(\mathbb{B}, \|\cdot\|)$? In such cases, in particular when the square of the norm $\|\cdot\|$ is not given by an inner product, we are aiming at inequalities of the following type: Let $X_1, X_2, \ldots, X_n$ be independent random vectors with values in $(\mathbb{B}, \|\cdot\|)$ with $\mathbb{E} X_i = 0$ and $\mathbb{E}\|X_i\|^2 < \infty$. With $S_n := \sum_{i=1}^n X_i$ we want to show that

\[ \mathbb{E}\|S_n\|^2 \le K \sum_{i=1}^n \mathbb{E}\|X_i\|^2 \tag{4} \]

for some constant $K$ depending only on $(\mathbb{B}, \|\cdot\|)$.

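For statistical applications, the case $(\mathbb{B}, \|\cdot\|) = \ell_r^d := (\mathbb{R}^d, \|\cdot\|_r)$ for some $r \in [1, \infty]$ is of particular interest. Here the $r$-norm of a vector $x \in \mathbb{R}^d$ is defined as

\[ \|x\|_r := \begin{cases} \bigl( \sum_{j=1}^d |x_j|^r \bigr)^{1/r} & \text{if } 1 \le r < \infty, \\ \max_{1 \le j \le d} |x_j| & \text{if } r = \infty. \end{cases} \tag{5} \]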

An obvious question is how the exponent $r$ and the dimension $d$ enter an inequality of type (4).
The influence of the dimension $d$ is crucial, since current statistical research often involves
small or moderate "sample size" n (the number of independent units), say on the order of 10^ 
or 10"*, while the number d of items measured for each independent unit is large, say on the 
order of 10^ or 10^. The following two examples for the random vectors Xi provide lower 
bounds for the constant $K$ in (4):

Example 1.1. (A lower bound in $\ell_r^d$) Let $b_1, b_2, \ldots, b_d$ denote the standard basis of $\mathbb{R}^d$, and let $\epsilon_1, \epsilon_2, \ldots, \epsilon_d$ be independent Rademacher variables, i.e. random variables taking the values $+1$ and $-1$ each with probability 1/2. Define $X_i := \epsilon_i b_i$ for $1 \le i \le n := d$. Then $\mathbb{E} X_i = 0$, $\|X_i\|_r^2 = 1$, and $\|S_n\|_r^2 = d^{2/r} = d^{2/r - 1} \sum_{i=1}^n \|X_i\|_r^2$. Thus any candidate for $K$ in (4) has to satisfy $K \ge d^{2/r - 1}$.
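The computation in Example 1.1 can be checked numerically. The following is a minimal sketch in Python; the helper names `r_norm` and `example_ratio` are ours and purely illustrative. It builds $S_n = \sum_i \epsilon_i b_i$ and compares the ratio $\|S_n\|_r^2 / \sum_i \|X_i\|_r^2$ with $d^{2/r-1}$.

```python
import random

def r_norm(x, r):
    """r-norm of a vector x; r = float('inf') gives the maximum norm."""
    if r == float("inf"):
        return max(abs(v) for v in x)
    return sum(abs(v) ** r for v in x) ** (1.0 / r)

def example_ratio(d, r, seed=0):
    """Ratio ||S_n||_r^2 / sum_i ||X_i||_r^2 for X_i = eps_i * b_i and n = d."""
    rng = random.Random(seed)
    eps = [rng.choice((-1.0, 1.0)) for _ in range(d)]
    s_n = eps  # the i-th coordinate of S_n = sum_i eps_i b_i is eps_i
    return r_norm(s_n, r) ** 2 / d  # each ||X_i||_r^2 equals 1

d, r = 16, 1.5
ratio = example_ratio(d, r)
lower_bound = d ** (2.0 / r - 1.0)
print(ratio, lower_bound)
```

The ratio does not depend on the signs drawn, which is why a single realization suffices here.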

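Example 1.2. (A lower bound in $\ell_\infty^d$) Let $X_1, X_2, X_3, \ldots$ be independent random vectors, uniformly distributed on $\{-1, 1\}^d$ each. Then $\mathbb{E} X_i = 0$ and $\|X_i\|_\infty = 1$. On the other hand, according to the Central Limit Theorem, $n^{-1/2} S_n$ converges in distribution as $n \to \infty$ to a random vector $Z = (Z_j)_{j=1}^d$ with independent, standard Gaussian components, $Z_j \sim N(0, 1)$. Hence

\[ \sup_{n \ge 1} \frac{\mathbb{E}\|S_n\|_\infty^2}{\sum_{i=1}^n \mathbb{E}\|X_i\|_\infty^2} = \sup_{n \ge 1} \mathbb{E}\|n^{-1/2} S_n\|_\infty^2 \ge \mathbb{E}\|Z\|_\infty^2 = \mathbb{E} \max_{1 \le j \le d} Z_j^2. \]

But it is well-known that $\max_{1 \le j \le d} |Z_j| = \sqrt{2 \log d} + o_p(1)$ as $d \to \infty$. Thus candidates $K(d)$ for the constant in (4) have to satisfy

\[ \liminf_{d \to \infty} \frac{K(d)}{2 \log d} \ge 1. \]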
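To get a feeling for how quickly $\mathbb{E} \max_j Z_j^2$ approaches $2 \log d$, one may simulate it. The sketch below is a plain Monte Carlo estimate in Python; the function name, the number of repetitions and the seed are arbitrary choices of ours, and the result is only an approximation.

```python
import math
import random

def mean_max_square(d, reps=500, seed=1):
    """Monte Carlo estimate of E max_j Z_j^2 for d independent N(0,1) variables."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        total += max(rng.gauss(0.0, 1.0) ** 2 for _ in range(d))
    return total / reps

d = 200
ratio = mean_max_square(d) / (2.0 * math.log(d))
print(ratio)
```

For moderate $d$ the ratio stays visibly below 1, which reflects the slow (logarithmic) convergence.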

At least three different methods have been developed to prove inequalities of the form
given by (4). The three approaches known to us are:

(a) deterministic inequalities for norms; 

(b) probabilistic methods for Banach spaces; 

(c) empirical process methods. 

Approach (a) was used by Nemirovski [14] to show that in the space $\ell_r^d$ with $d \ge 2$, inequality (4) holds with $K = C \min(r, \log(d))$ for some universal (but unspecified) constant $C$. In view of Example 1.2, this constant has the correct order of magnitude if $r = \infty$. For statistical applications see Greenshtein and Ritov [7]. Approach (b) uses special moment inequalities from probability theory on Banach spaces which involve nonrandom vectors in $\mathbb{B}$ and Rademacher variables as introduced in Example 1.1. Empirical process theory (approach (c)) in general deals with sums of independent random elements in infinite-dimensional Banach spaces. By means of chaining arguments, metric entropies and approximation arguments, "maximal inequalities" for such random sums are built from basic inequalities for sums of independent random variables or finite-dimensional random vectors, in particular, "exponential inequalities"; see e.g. Dudley [4], van der Vaart and Wellner [26], Pollard [21], de la Peña and Giné [3], or van de Geer [25].

Our main goal in this paper is to compare the inequalities resulting from these different approaches and to refine or improve the constants $K$ obtainable by each method. The remainder of this paper is organized as follows: In Section 2 we review several deterministic inequalities for norms and, in particular, key arguments of Nemirovski [14]. Our exposition includes explicit and improved constants. While finishing the present paper we became aware of yet unpublished work of [15] and [12] who also improved some inequalities of [14]. Rio uses similar methods in a different context. In Section 3 we present inequalities of type (4) which follow from type and co-type inequalities developed in probability theory on Banach spaces. In addition, we provide and utilize a new type inequality for the normed space $\ell_\infty^d$. To do so we utilize, among other tools, exponential inequalities of Hoeffding [9] and Pinelis [17]. In Section 4 we follow approach (c) and treat $\ell_\infty^d$ by means of a truncation argument and Bernstein's exponential inequality. Finally, in Section 5 we compare the inequalities resulting from these three approaches. In that section we relax the assumption that $\mathbb{E} X_i = 0$ for a more thorough understanding of the differences between the three approaches. Most proofs are deferred to Section 6.

2 Nemirovski's approach: Deterministic inequalities for 
norms. 

In this section we review and refine inequalities of type (4) based on deterministic inequalities for norms. The considerations for $(\mathbb{B}, \|\cdot\|) = \ell_r^d$ follow closely the arguments of [14].

2.1 Some inequalities for $\mathbb{R}^d$ and the norms $\|\cdot\|_r$

Throughout this subsection let $\mathbb{B} = \mathbb{R}^d$, equipped with one of the norms $\|\cdot\|_r$ defined in (5). For $x \in \mathbb{R}^d$ we think of $x$ as a column vector and write $x^\top$ for the corresponding row vector. Thus $x x^\top$ is a $d \times d$ matrix with entries $x_i x_j$ for $i, j \in \{1, \ldots, d\}$.

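A first solution. Recall that for any $x \in \mathbb{R}^d$,

\[ \|x\|_r \le \|x\|_q \le d^{1/q - 1/r} \|x\|_r \quad \text{for } 1 \le q < r \le \infty. \tag{6} \]

Moreover, as mentioned before,

\[ \mathbb{E}\|S_n\|_2^2 = \sum_{i=1}^n \mathbb{E}\|X_i\|_2^2. \]

Thus for $1 \le q < 2$,

\[ \mathbb{E}\|S_n\|_q^2 \le d^{2/q - 1}\, \mathbb{E}\|S_n\|_2^2 = d^{2/q - 1} \sum_{i=1}^n \mathbb{E}\|X_i\|_2^2 \le d^{2/q - 1} \sum_{i=1}^n \mathbb{E}\|X_i\|_q^2, \]

whereas for $2 < r \le \infty$,

\[ \mathbb{E}\|S_n\|_r^2 \le \mathbb{E}\|S_n\|_2^2 = \sum_{i=1}^n \mathbb{E}\|X_i\|_2^2 \le d^{1 - 2/r} \sum_{i=1}^n \mathbb{E}\|X_i\|_r^2. \]

Thus we may conclude that (4) holds with

\[ K = \widetilde{K}(d, r) := \begin{cases} d^{2/r - 1} & \text{if } 1 \le r \le 2, \\ d^{1 - 2/r} & \text{if } 2 \le r \le \infty. \end{cases} \]

Example 1.1 shows that this constant $\widetilde{K}(d, r)$ is indeed optimal for $1 \le r \le 2$.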
A refinement for $r > 2$. In what follows we shall replace $\widetilde{K}(d, r) = d^{1 - 2/r}$ with substantially smaller constants. The main ingredient is the following result:

Lemma 2.1. For arbitrary fixed $r \in [2, \infty)$ and $x \in \mathbb{R}^d \setminus \{0\}$ let

\[ h(x) := 2 \|x\|_r^{2-r} \bigl( |x_i|^{r-2} x_i \bigr)_{i=1}^d, \]

while $h(0) := 0$. Then for arbitrary $x, y \in \mathbb{R}^d$,

\[ \|x\|_r^2 + h(x)^\top y \le \|x + y\|_r^2 \le \|x\|_r^2 + h(x)^\top y + (r - 1) \|y\|_r^2. \]
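The two-sided bound of Lemma 2.1 is easy to probe on random inputs. The following Python sketch (helper names are ours, chosen for illustration only) evaluates $\|x\|_r^2 + h(x)^\top y$, $\|x+y\|_r^2$ and $\|x\|_r^2 + h(x)^\top y + (r-1)\|y\|_r^2$ for random $x$, $y$ and $r \in [2, 8]$ and counts violations up to a small floating-point tolerance. It is a sanity check, not a proof.

```python
import random

def norm_r(x, r):
    return sum(abs(v) ** r for v in x) ** (1.0 / r)

def h(x, r):
    """h(x) = 2 ||x||_r^(2-r) (|x_i|^(r-2) x_i)_i, with h(0) = 0."""
    nx = norm_r(x, r)
    if nx == 0.0:
        return [0.0] * len(x)
    return [2.0 * nx ** (2 - r) * abs(v) ** (r - 2) * v for v in x]

def lemma_bounds(x, y, r):
    """Return (lower, middle, upper) of the two-sided inequality."""
    lin = sum(a * b for a, b in zip(h(x, r), y))
    lower = norm_r(x, r) ** 2 + lin
    middle = norm_r([a + b for a, b in zip(x, y)], r) ** 2
    upper = lower + (r - 1) * norm_r(y, r) ** 2
    return lower, middle, upper

rng = random.Random(0)
violations = 0
for _ in range(2000):
    r = rng.uniform(2.0, 8.0)
    x = [rng.gauss(0, 1) for _ in range(5)]
    y = [rng.gauss(0, 1) for _ in range(5)]
    lo, mid, up = lemma_bounds(x, y, r)
    if not (lo <= mid + 1e-9 and mid <= up + 1e-9):
        violations += 1
print(violations)
```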

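[16] and [14] stated Lemma 2.1 with the factor $r - 1$ on the right side replaced with $Cr$ for some (absolute) constant $C > 1$. Lemma 2.1, which is a special case of the more general Lemma 2.4 in the next subsection, may be applied to the partial sums $S_0 := 0$ and $S_k := \sum_{i=1}^k X_i$, $1 \le k \le n$, to show that for $2 \le r < \infty$,

\begin{align*}
\mathbb{E}\|S_k\|_r^2 &\le \mathbb{E}\bigl( \|S_{k-1}\|_r^2 + h(S_{k-1})^\top X_k + (r - 1) \|X_k\|_r^2 \bigr) \\
&= \mathbb{E}\|S_{k-1}\|_r^2 + \mathbb{E} h(S_{k-1})^\top\, \mathbb{E} X_k + (r - 1)\, \mathbb{E}\|X_k\|_r^2 \\
&= \mathbb{E}\|S_{k-1}\|_r^2 + (r - 1)\, \mathbb{E}\|X_k\|_r^2,
\end{align*}

and inductively we obtain a second candidate for $K$ in (4):

\[ \mathbb{E}\|S_n\|_r^2 \le (r - 1) \sum_{i=1}^n \mathbb{E}\|X_i\|_r^2 \quad \text{for } 2 \le r < \infty. \]

Finally, we apply (6) again: For $2 \le q \le r \le \infty$ with $q < \infty$,

\[ \mathbb{E}\|S_n\|_r^2 \le \mathbb{E}\|S_n\|_q^2 \le (q - 1) \sum_{i=1}^n \mathbb{E}\|X_i\|_q^2 \le (q - 1)\, d^{2/q - 2/r} \sum_{i=1}^n \mathbb{E}\|X_i\|_r^2. \]

This inequality entails our first ($q = 2$) and second ($q = r < \infty$) preliminary result, and we arrive at the following refinement: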

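Theorem 2.2. For arbitrary $r \in [2, \infty]$,

\[ \mathbb{E}\|S_n\|_r^2 \le K_{\mathrm{Nem}}(d, r) \sum_{i=1}^n \mathbb{E}\|X_i\|_r^2 \]

with

\[ K_{\mathrm{Nem}}(d, r) := \inf_{q \in [2, r] \cap \mathbb{R}} (q - 1)\, d^{2/q - 2/r}. \]

This constant $K_{\mathrm{Nem}}(d, r)$ satisfies the (in)equalities

\[ K_{\mathrm{Nem}}(d, r) \begin{cases} = d^{1 - 2/r} & \text{if } d \le 7, \\ \le r - 1, & \\ \le 2e \log d - e & \text{if } d \ge 3, \end{cases} \]

and

\[ K_{\mathrm{Nem}}(d, \infty) \ge 2e \log d - 3e. \]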
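The constant $K_{\mathrm{Nem}}(d, r)$ has no simple closed form in general, but it is a one-dimensional infimum and therefore cheap to approximate. The sketch below uses a plain grid search in Python; the function name, the grid size and the cap on $q$ used for $r = \infty$ are our own assumptions, made only for illustration.

```python
import math

def k_nem(d, r, grid=20000):
    """Grid approximation of inf over q in [2, r] of (q - 1) * d^(2/q - 2/r)."""
    finite = math.isfinite(r)
    # Assumption: for r = infinity the minimizer lies below 4 log d + 2.
    r_cap = r if finite else 4.0 * max(1.0, math.log(d)) + 2.0
    inv_r = 1.0 / r if finite else 0.0
    best = float("inf")
    for k in range(grid + 1):
        q = 2.0 + (r_cap - 2.0) * k / grid
        best = min(best, (q - 1.0) * d ** (2.0 / q - 2.0 * inv_r))
    return best

for d in (3, 7, 100, 10**6):
    k = k_nem(d, float("inf"))
    lower = 2 * math.e * math.log(d) - 3 * math.e
    upper = 2 * math.e * math.log(d) - math.e
    print(d, lower, k, upper)
```

Since a grid minimum can only overestimate the infimum, the printed values are upper approximations of $K_{\mathrm{Nem}}(d, \infty)$.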

Corollary 2.3. In case of $(\mathbb{B}, \|\cdot\|) = \ell_\infty^d$ with $d \ge 3$, inequality (4) holds with constant $K = 2e \log d - e$. If the $X_i$'s are also identically distributed, then

\[ \mathbb{E}\|n^{-1/2} S_n\|_\infty^2 \le (2e \log d - e)\, \mathbb{E}\|X_1\|_\infty^2. \]

Note that

\[ \lim_{d \to \infty} \frac{K_{\mathrm{Nem}}(d, \infty)}{2 \log d} = \lim_{d \to \infty} \frac{2e \log d - e}{2 \log d} = e. \]

Thus Example 1.2 entails that for large dimension $d$, the constants $K_{\mathrm{Nem}}(d, \infty)$ and $2e \log d - e$ are optimal up to a factor close to $e \approx 2.7183$.

2.2 Arbitrary $L_r$-spaces

Lemma 2.1 is a special case of a more general inequality: Let $(T, \Sigma, \mu)$ be a $\sigma$-finite measure space, and for $1 \le r < \infty$ let $L_r(\mu)$ be the set of all measurable functions $f : T \to \mathbb{R}$ with finite (semi-) norm

\[ \|f\|_r := \Bigl( \int |f|^r \, d\mu \Bigr)^{1/r}, \]

where two such functions are viewed as equivalent if they coincide almost everywhere with respect to $\mu$. In what follows we investigate the functional

\[ f \mapsto V(f) := \|f\|_r^2 \]

on $L_r(\mu)$. Note that $(\mathbb{R}^d, \|\cdot\|_r)$ corresponds to $(L_r(\mu), \|\cdot\|_r)$ if we take $T = \{1, 2, \ldots, d\}$ equipped with counting measure $\mu$.

Note that $V(\cdot)$ is convex; thus for fixed $f, g \in L_r(\mu)$, the function

\[ v(t) := V(f + tg) = \|f + tg\|_r^2, \quad t \in \mathbb{R}, \]

is convex with derivative

\[ v'(t) = v^{1 - r/2}(t) \int 2 |f + tg|^{r-2} (f + tg)\, g \, d\mu. \]

By convexity of $v$ it follows that

\[ V(f + g) - V(f) = v(1) - v(0) \ge v'(0) =: DV(f, g). \]

This proves the lower bound in the following lemma. We will prove the upper bound in Section 6 by computation of $v''$ and application of Hölder's inequality.

Lemma 2.4. Let $r \ge 2$. Then for arbitrary $f, g \in L_r(\mu)$,

\[ DV(f, g) = \int h(f)\, g \, d\mu \quad \text{with} \quad h(f) := 2 \|f\|_r^{2-r} |f|^{r-2} f \in L_q(\mu), \]

where $q := r/(r - 1)$. Moreover,

\[ V(f) + DV(f, g) \le V(f + g) \le V(f) + DV(f, g) + (r - 1) V(g). \]

Remark 2.5. The upper bound for $V(f + g)$ is sharp in the following sense: Suppose that $\mu(T) < \infty$, and let $f, g_o : T \to \mathbb{R}$ be measurable such that $|f| \equiv |g_o| \equiv 1$ and $\int f g_o \, d\mu = 0$. Then our proof of Lemma 2.4 reveals that

\[ \frac{V(f + t g_o) - V(f) - DV(f, t g_o)}{V(t g_o)} \to r - 1 \quad \text{as } t \to 0. \]

Remark 2.6. In case of $r = 2$, Lemma 2.4 is well known and easily verified. Here the upper bound for $V(f + g)$ is even an equality, i.e.

\[ V(f + g) = V(f) + DV(f, g) + V(g). \]

Remark 2.7. Lemma 2.4 improves on an inequality of [16]. After writing this paper we realized Lemma 2.4 is also proved by [18]; see his (2.2) and Proposition 2.1, page 1680.

Lemma 2.4 leads directly to the following result:

Corollary 2.8. In case of $\mathbb{B} = L_r(\mu)$ for $r \ge 2$, inequality (4) is satisfied with $K = r - 1$.

2.3 A connection to geometrical functional analysis 

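For any Banach space $(\mathbb{B}, \|\cdot\|)$ and Hilbert space $(\mathbb{H}, \langle \cdot, \cdot \rangle, \|\cdot\|)$, their Banach-Mazur distance $D(\mathbb{B}, \mathbb{H})$ is defined to be the infimum of

\[ \|T\| \cdot \|T^{-1}\| \]

over all linear isomorphisms $T : \mathbb{B} \to \mathbb{H}$, where $\|T\|$ and $\|T^{-1}\|$ denote the usual operator norms

\begin{align*}
\|T\| &:= \sup\{ \|Tx\| : x \in \mathbb{B}, \|x\| \le 1 \}, \\
\|T^{-1}\| &:= \sup\{ \|T^{-1} y\| : y \in \mathbb{H}, \|y\| \le 1 \}.
\end{align*}

(If no such bijection exists, one defines $D(\mathbb{B}, \mathbb{H}) := \infty$.) Given such a bijection $T$,

\[ \mathbb{E}\|S_n\|^2 \le \|T^{-1}\|^2\, \mathbb{E}\|T S_n\|^2 = \|T^{-1}\|^2 \sum_{i=1}^n \mathbb{E}\|T X_i\|^2 \le \|T^{-1}\|^2 \|T\|^2 \sum_{i=1}^n \mathbb{E}\|X_i\|^2, \]

where the middle equality is (3) applied to the centered $\mathbb{H}$-valued vectors $T X_i$. This leads to the following observation:

Corollary 2.9. For any Banach space $(\mathbb{B}, \|\cdot\|)$ and any Hilbert space $(\mathbb{H}, \langle \cdot, \cdot \rangle, \|\cdot\|)$ with finite Banach-Mazur distance $D(\mathbb{B}, \mathbb{H})$, inequality (4) is satisfied with $K = D(\mathbb{B}, \mathbb{H})^2$.

A famous result from geometrical functional analysis is John's theorem (cf. [24], [11]) for finite-dimensional normed spaces. It entails that $D(\mathbb{B}, \ell_2^{\dim(\mathbb{B})}) \le \sqrt{\dim(\mathbb{B})}$. This entails the following fact:

Corollary 2.10. For any normed space $(\mathbb{B}, \|\cdot\|)$ with finite dimension, inequality (4) is satisfied with $K = \dim(\mathbb{B})$.

Note that Example 1.1 with $r = 1$ provides an example where the constant $K = \dim(\mathbb{B})$ is optimal.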



3 The probabilistic approach: Type and co-type 
inequalities. 

3.1 Rademacher type and cotype inequalities 

Let $\{\epsilon_i\}$ denote a sequence of independent Rademacher random variables. Let $1 \le p < \infty$. A Banach space $\mathbb{B}$ with norm $\|\cdot\|$ is said to be of (Rademacher) type $p$ if there is a constant $T_p$ such that for all finite sequences $\{x_i\}$ in $\mathbb{B}$,

\[ \mathbb{E}\Bigl\| \sum_{i=1}^n \epsilon_i x_i \Bigr\|^p \le T_p^p \sum_{i=1}^n \|x_i\|^p. \]

Similarly, for $1 \le q < \infty$, $\mathbb{B}$ is of (Rademacher) cotype $q$ if there is a constant $C_q$ such that for all finite sequences $\{x_i\}$ in $\mathbb{B}$,

\[ \mathbb{E}\Bigl\| \sum_{i=1}^n \epsilon_i x_i \Bigr\|^q \ge C_q^{-q} \sum_{i=1}^n \|x_i\|^q. \]

Ledoux and Talagrand [13], page 247, note that type and cotype properties appear as dual notions: If a Banach space $\mathbb{B}$ is of type $p$, its dual space $\mathbb{B}'$ is of cotype $q = p/(p - 1)$.

One of the basic results concerning Banach spaces with type $p$ and cotype $q$ is the following proposition:

Proposition 3.1. [[13], Proposition 9.11, page 248].
If $\mathbb{B}$ is of type $p \ge 1$ with constant $T_p$, then

\[ \mathbb{E}\|S_n\|^p \le (2 T_p)^p \sum_{i=1}^n \mathbb{E}\|X_i\|^p. \]

If $\mathbb{B}$ is of cotype $q \ge 1$ with constant $C_q$, then

\[ \mathbb{E}\|S_n\|^q \ge (2 C_q)^{-q} \sum_{i=1}^n \mathbb{E}\|X_i\|^q. \]

As shown in ifTSJI . page 27, the Banach space Lr{iJ,) with 1 < r < oo (cf. section |23| | 
is of type $\min(r, 2)$. Similarly, $L_r(\mu)$ is of cotype $\max(r, 2)$. In case of $r \ge 2 = p$, explicit
values for the constant $T_p$ in Proposition 3.1 can be obtained from the optimal constants in
Khintchine's inequalities due to [8].

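Lemma 3.2. For $2 \le r < \infty$, the space $L_r(\mu)$ is of type 2 with constant $T_2 = B_r$, where

\[ B_r := 2^{1/2} \Bigl( \frac{\Gamma((r + 1)/2)}{\sqrt{\pi}} \Bigr)^{1/r}. \]

Corollary 3.3. For $\mathbb{B} = L_r(\mu)$, $2 \le r < \infty$, inequality (4) is satisfied with $K = 4 B_r^2$.

Note that $B_2 = 1$ and

\[ \frac{B_r}{\sqrt{r}} \to \frac{1}{\sqrt{e}} \quad \text{as } r \to \infty. \]

Thus for large values $r$, the conclusion of Corollary 3.3 is weaker than the one of Corollary 2.8.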
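The comparison between $K = 4 B_r^2$ and $K = r - 1$ can be made concrete with a few numbers. The following Python sketch assumes the standard form of the optimal Khintchine constant, $B_r = \sqrt{2}\,(\Gamma((r+1)/2)/\sqrt{\pi})^{1/r}$, and evaluates it through the log-gamma function; the function name is ours.

```python
import math

def khintchine_b(r):
    """B_r = sqrt(2) * (Gamma((r + 1) / 2) / sqrt(pi))^(1/r), via log-gamma."""
    log_ratio = math.lgamma((r + 1.0) / 2.0) - 0.5 * math.log(math.pi)
    return math.sqrt(2.0) * math.exp(log_ratio / r)

for r in (2, 4, 10, 100, 10_000):
    b = khintchine_b(r)
    # columns: r, constant from Corollary 3.3, constant r - 1, normalized B_r
    print(r, 4.0 * b * b, r - 1, b / math.sqrt(r))
```

The last column should approach $1/\sqrt{e} \approx 0.6065$, so that $4 B_r^2 \approx 4r/e$ exceeds $r - 1$ for large $r$.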



3.2 The space $\ell_\infty^d$

The preceding results apply only to $r < \infty$, so the special space $\ell_\infty^d$ requires different arguments. At first we deduce a new type inequality based on Hoeffding's [9] exponential inequality: If $\epsilon_1, \epsilon_2, \ldots, \epsilon_n$ are independent Rademacher random variables, $a_1, a_2, \ldots, a_n$ are real numbers and $v^2 := \sum_{i=1}^n a_i^2$, then the tail probabilities of the random variable $\sum_{i=1}^n a_i \epsilon_i$ may be bounded as follows:

\[ \mathbb{P}\Bigl( \Bigl| \sum_{i=1}^n a_i \epsilon_i \Bigr| \ge z \Bigr) \le 2 \exp\Bigl( -\frac{z^2}{2 v^2} \Bigr), \quad z \ge 0. \tag{7} \]

At the heart of these tail bounds is the following exponential moment bound:

\[ \mathbb{E} \exp\Bigl( t \sum_{i=1}^n a_i \epsilon_i \Bigr) \le \exp(t^2 v^2 / 2), \quad t \in \mathbb{R}. \tag{8} \]
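For Rademacher sums the exponential moment is available in closed form, $\mathbb{E} \exp(t \sum_i a_i \epsilon_i) = \prod_i \cosh(t a_i)$, so the bound (8) reduces to $\cosh(x) \le \exp(x^2/2)$ applied factor by factor. A small Python sketch of this comparison (function names are ours):

```python
import math

def rademacher_mgf(t, a):
    """Exact E exp(t * sum_i a_i eps_i): independence gives a product of cosh terms."""
    return math.prod(math.cosh(t * ai) for ai in a)

def hoeffding_mgf_bound(t, a):
    """Right-hand side exp(t^2 v^2 / 2) with v^2 = sum_i a_i^2."""
    v2 = sum(ai * ai for ai in a)
    return math.exp(t * t * v2 / 2.0)

a = [0.5, -1.0, 2.0, 0.25]
for t in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(t, rademacher_mgf(t, a), hoeffding_mgf_bound(t, a))
```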



From the latter bound we shall deduce the following type inequality in Section 6:

Lemma 3.4. The space $\ell_\infty^d$ is of type 2 with constant $\sqrt{2 \log(2d)}$.

Using this upper bound together with Proposition 3.1 yields another Nemirovski type inequality:

Corollary 3.5. For $(\mathbb{B}, \|\cdot\|) = \ell_\infty^d$, inequality (4) is satisfied with $K = K_{\mathrm{Type2}}(d, \infty) = 8 \log(2d)$.

Refinements. Let $T_2(\ell_\infty^d)$ be the optimal type 2 constant for the space $\ell_\infty^d$. So far we know that $T_2(\ell_\infty^d) \le \sqrt{2 \log(2d)}$. Moreover, by a modification of Example 1.2 one can show that

\[ T_2(\ell_\infty^d) \ge c_d := \sqrt{ \mathbb{E} \max_{1 \le j \le d} Z_j^2 }. \tag{9} \]

The constants $c_d$ can be expressed or bounded in terms of the distribution function $\Phi$ of $N(0, 1)$, i.e. $\Phi(z) = \int_{-\infty}^z \phi(x) \, dx$ with $\phi(x) = \exp(-x^2/2)/\sqrt{2\pi}$. Namely, with $W := \max_{1 \le j \le d} |Z_j|$,

\[ c_d^2 = \mathbb{E}(W^2) = \mathbb{E} \int_0^\infty 2t\, 1_{[t \le W]} \, dt = \int_0^\infty 2t\, \mathbb{P}(W \ge t) \, dt, \]

and for any $t > 0$,

\[ \mathbb{P}(W \ge t) \begin{cases} = 1 - \mathbb{P}(W < t) = 1 - \mathbb{P}(|Z_1| < t)^d = 1 - (2\Phi(t) - 1)^d, \\ \le d\, \mathbb{P}(|Z_1| \ge t) = 2d (1 - \Phi(t)). \end{cases} \]

These considerations and various bounds for $\Phi$ will allow us to derive explicit bounds for $c_d$.

On the other hand, Hoeffding's inequality (7) has been refined by Pinelis [17], [20] as follows:

\[ \mathbb{P}\Bigl( \Bigl| \sum_{i=1}^n a_i \epsilon_i \Bigr| \ge z \Bigr) \le 2K (1 - \Phi(z/v)), \quad z > 0, \tag{10} \]

where $K$ satisfies $3.18 \le K \le 3.22$. This will be the main ingredient for refined upper bounds for $T_2(\ell_\infty^d)$. The next lemma summarizes our findings:

Lemma 3.6. The constants $c_d$ and $T_2(\ell_\infty^d)$ satisfy the following inequalities:

\[ T_2(\ell_\infty^d) \le \sqrt{2 \log d + h_2(d)}, \quad d \ge 1, \]

\[ \sqrt{2 \log d + h_1(d)} \le c_d \le \begin{cases} \sqrt{2 \log d}, & d \ge 3, \\ \sqrt{2 \log d + h_3(d)}, & d \ge 1, \end{cases} \tag{11} \]

where $h_2(d) \le 3$, $h_2(d)$ becomes negative for $d > 4.13795 \times 10^{10}$, $h_3(d)$ becomes negative for $d \ge 14$, and $h_j(d) \sim -\log \log d$ as $d \to \infty$ for $j = 1, 2, 3$.

In particular, one could replace $K_{\mathrm{Type2}}(d, \infty)$ in Corollary 3.5 with $8 \log d + 4 h_2(d)$.

4 The empirical process approach: Truncation and 
Bernstein's inequality. 

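An alternative to Hoeffding's exponential tail inequality (7) is a classical exponential bound due to Bernstein (see e.g. [2]): Let $Y_1, Y_2, \ldots, Y_n$ be independent random variables with mean zero such that $|Y_i| \le \kappa$. Then for any $v^2 \ge \sum_{i=1}^n \mathrm{Var}(Y_i)$,

\[ \mathbb{P}\Bigl( \Bigl| \sum_{i=1}^n Y_i \Bigr| \ge x \Bigr) \le 2 \exp\Bigl( -\frac{x^2}{2 (v^2 + \kappa x / 3)} \Bigr), \quad x > 0. \tag{12} \]

We will not use this inequality itself but rather an exponential moment inequality underlying its proof:

Lemma 4.1. For $L > 0$ define

\[ e(L) := \exp(1/L) - 1 - 1/L. \]

Let $Y$ be a random variable with mean zero and variance $\sigma^2$ such that $|Y| \le \kappa$. Then for any $L > 0$,

\[ \mathbb{E} \exp\Bigl( \frac{Y}{\kappa L} \Bigr) \le 1 + \frac{\sigma^2 e(L)}{\kappa^2} \le \exp\Bigl( \frac{\sigma^2 e(L)}{\kappa^2} \Bigr). \]

With the latter exponential moment bound we can prove a moment inequality for random vectors with bounded components:

Lemma 4.2. Suppose that $X_i = (X_{i,j})_{j=1}^d$ satisfies $\|X_i\|_\infty \le \kappa$, and let $\Gamma$ be an upper bound for $\max_{1 \le j \le d} \sum_{i=1}^n \mathrm{Var}(X_{i,j})$. Then for any $L > 0$,

\[ \sqrt{\mathbb{E}\|S_n\|_\infty^2} \le \kappa L \log(2d) + \frac{\Gamma L\, e(L)}{\kappa}. \]

Now we return to our general random vectors $X_i \in \mathbb{R}^d$ with mean zero and $\mathbb{E}\|X_i\|_\infty^2 < \infty$. They are split into two random vectors via truncation: $X_i = X_i^{(a)} + X_i^{(b)}$ with

\[ X_i^{(a)} := 1_{[\|X_i\|_\infty \le \kappa_o]} X_i \quad \text{and} \quad X_i^{(b)} := 1_{[\|X_i\|_\infty > \kappa_o]} X_i \]

for some constant $\kappa_o > 0$ to be specified later. Then we write $S_n = A_n + B_n$ with the centered random sums

\[ A_n := \sum_{i=1}^n \bigl( X_i^{(a)} - \mathbb{E} X_i^{(a)} \bigr) \quad \text{and} \quad B_n := \sum_{i=1}^n \bigl( X_i^{(b)} - \mathbb{E} X_i^{(b)} \bigr). \]

The sum $A_n$ involves centered random vectors in $[-2\kappa_o, 2\kappa_o]^d$ and will be treated by means of Lemma 4.2, while $B_n$ will be bounded with elementary methods. Choosing the threshold $\kappa_o$ and the parameter $L$ carefully yields the following theorem.

Theorem 4.3. In case of $(\mathbb{B}, \|\cdot\|) = \ell_\infty^d$ for some $d \ge 1$, inequality (4) holds with

\[ K = K_{\mathrm{TrBern}}(d, \infty) := \bigl( 1 + 3.46 \sqrt{\log(2d)} \bigr)^2. \]

If the random vectors $X_i$ are symmetrically distributed around 0, one may even set

\[ K = K_{\mathrm{TrBern}}^{(\mathrm{symm})}(d, \infty) = \bigl( 1 + 2.9 \sqrt{\log(2d)} \bigr)^2. \]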

5 Comparisons. 

In this section we compare the three approaches just described for the space $\ell_\infty^d$. As to the random vectors $X_i$, we broaden our point of view and consider three different cases:

General case: The random vectors $X_i$ are independent with $\mathbb{E}\|X_i\|_\infty^2 < \infty$ for all $i$.

Centered case: In addition, $\mathbb{E} X_i = 0$ for all $i$.

Symmetric case: In addition, $X_i$ is symmetrically distributed around 0 for all $i$.

In view of the general case, we reformulate inequality (4) as follows:

\[ \mathbb{E}\|S_n - \mathbb{E} S_n\|_\infty^2 \le K \sum_{i=1}^n \mathbb{E}\|X_i\|_\infty^2. \tag{13} \]

One reason for this extension is that in some applications, particularly in connection with 
empirical processes, it is easier and more natural to work with uncentered summands Xi. Let 
us discuss briefly the consequences of this extension in the three frameworks: 

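Nemirovski's approach: Between the centered and symmetric case there is no difference. If (13) holds in the centered case for some $K$, then in the general case

\[ \mathbb{E}\|S_n - \mathbb{E} S_n\|_\infty^2 \le K \sum_{i=1}^n \mathbb{E}\|X_i - \mathbb{E} X_i\|_\infty^2 \le 4K \sum_{i=1}^n \mathbb{E}\|X_i\|_\infty^2. \]

The latter inequality follows from the general fact that

\[ \mathbb{E}\|Y - \mathbb{E} Y\|^2 \le \mathbb{E}\bigl( (\|Y\| + \|\mathbb{E} Y\|)^2 \bigr) \le 2\, \mathbb{E}\|Y\|^2 + 2 \|\mathbb{E} Y\|^2 \le 4\, \mathbb{E}\|Y\|^2. \]

This looks rather crude at first glance, but in case of the maximum norm and high dimension $d$, the factor 4 cannot be reduced. For let $Y \in \mathbb{R}^d$ have independent components $Y_1, \ldots, Y_d \in \{-1, 1\}$ with $\mathbb{P}(Y_j = 1) = 1 - \mathbb{P}(Y_j = -1) = p \in [1/2, 1)$. Then $\|Y\|_\infty \equiv 1$, while $\mathbb{E} Y = (2p - 1)_{j=1}^d$ and

\[ \|Y - \mathbb{E} Y\|_\infty = \begin{cases} 2(1 - p) & \text{if } Y_1 = \cdots = Y_d = 1, \\ 2p & \text{else.} \end{cases} \]

Hence

\[ \frac{\mathbb{E}\|Y - \mathbb{E} Y\|_\infty^2}{\mathbb{E}\|Y\|_\infty^2} = 4 \bigl( (1 - p)^2 p^d + p^2 (1 - p^d) \bigr). \]

If we set $p = 1 - d^{-1/2}$ for $d \ge 4$, then the latter ratio converges to 4 as $d \to \infty$.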
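The convergence of this ratio to 4 can be watched numerically. In the example, $\|Y - \mathbb{E}Y\|_\infty$ equals $2(1-p)$ when all components are $+1$ (probability $p^d$) and $2p$ otherwise, while $\|Y\|_\infty = 1$, so the ratio is $4((1-p)^2 p^d + p^2(1 - p^d))$. A minimal Python sketch with $p = 1 - d^{-1/2}$ (the function name is ours):

```python
def centering_ratio(d):
    """E||Y - EY||_inf^2 / E||Y||_inf^2 for i.i.d. +-1 components with p = 1 - d^(-1/2)."""
    p = 1.0 - d ** -0.5
    all_ones = p ** d  # probability that Y_1 = ... = Y_d = 1
    return 4.0 * ((1.0 - p) ** 2 * all_ones + p ** 2 * (1.0 - all_ones))

for d in (4, 100, 10**4, 10**8):
    print(d, centering_ratio(d))
```

The approach to 4 is slow, of order $d^{-1/2}$, because $p^2 = (1 - d^{-1/2})^2$ dominates once $p^d$ is negligible.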
The approach via Rademacher type 2 inequalities: The first part of Proposition 3.1, involving the Rademacher type constant $T_p$, remains valid if we drop the assumption that $\mathbb{E} X_i = 0$ and replace $S_n$ with $S_n - \mathbb{E} S_n$. Thus there is no difference between the general and the centered case. In the symmetric case, however, the factor $2^p$ in Proposition 3.1 becomes superfluous. Thus, if (13) holds with a certain constant $K$ in the general and centered case, we may replace $K$ with $K/4$ in the symmetric case.

The approach via truncation and Bernstein's inequality: Our proof for the centered case does not utilize that $\mathbb{E} X_i = 0$, so again there is no difference between the centered and general case. However, in the symmetric case, the truncated random vectors $1\{\|X_i\|_\infty \le \kappa_o\} X_i$ and $1\{\|X_i\|_\infty > \kappa_o\} X_i$ are centered, too, which leads to the substantially smaller constant $K$ in Theorem 4.3.

Summaries and comparisons. Table 1 summarizes the constants $K = K(d, \infty)$ we have found so far by the three different methods and for the three different cases. Table 2 contains the corresponding limits

\[ K^* := \lim_{d \to \infty} \frac{K(d, \infty)}{\log d}. \]

Interestingly, there is no global winner among the three methods. But for the centered case, Nemirovski's approach yields asymptotically the smallest constants. In particular,

\begin{align*}
\lim_{d \to \infty} \frac{K_{\mathrm{TrBern}}(d, \infty)}{K_{\mathrm{Nem}}(d, \infty)} &= \frac{3.46^2}{2e} = 2.20205, \\
\lim_{d \to \infty} \frac{K_{\mathrm{Type2}}(d, \infty)}{K_{\mathrm{Nem}}(d, \infty)} &= \frac{4}{e} = 1.47152, \\
\lim_{d \to \infty} \frac{K_{\mathrm{TrBern}}(d, \infty)}{K_{\mathrm{Type2}}(d, \infty)} &= \frac{3.46^2}{8} = 1.49645.
\end{align*}

The conclusion at this point seems to be that Nemirovski's approach and the type 2 inequalities yield better constants than Bernstein's inequality and truncation. Figure 1 shows the constants $K(d, \infty)$ for the centered case over a certain range of dimensions $d$.





                        General case                  Centered case                 Symmetric case
Nemirovski              8e log d - 4e                 2e log d - e                  2e log d - e
Type 2 inequalities     8 log(2d)                     8 log(2d)                     2 log(2d)
                        8 log d + 4 h_2(d)            8 log d + 4 h_2(d)            2 log d + h_2(d)
Truncation/Bernstein    (1 + 3.46 sqrt(log(2d)))^2    (1 + 3.46 sqrt(log(2d)))^2    (1 + 2.9 sqrt(log(2d)))^2

Table 1: The different constants $K(d, \infty)$

                        General case        Centered case       Symmetric case
Nemirovski              8e = 21.7463        2e = 5.4366         2e = 5.4366
Type 2 inequalities     8.0                 8.0                 2.0
Truncation/Bernstein    3.46^2 = 11.9716    3.46^2 = 11.9716    2.9^2 = 8.41

Table 2: The different limits $K^*$
Figure 1: Comparison of $K(d, \infty)$ obtained via the three proof methods: Blue (bottom) =
Nemirovski; Magenta and Red (middle) = type 2 inequalities; Green (top) = truncation and
Bernstein inequality
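The ordering of the three curves for the centered case can be reproduced from the closed-form constants of Corollary 2.3, Corollary 3.5 and Theorem 4.3. The following Python sketch tabulates them for a few dimensions; the function names are ours, and the refined type 2 constant involving $h_2(d)$ is deliberately left out.

```python
import math

def k_nemirovski(d):
    return 2 * math.e * math.log(d) - math.e  # Corollary 2.3, valid for d >= 3

def k_type2(d):
    return 8 * math.log(2 * d)  # Corollary 3.5

def k_trunc_bernstein(d):
    return (1 + 3.46 * math.sqrt(math.log(2 * d))) ** 2  # Theorem 4.3

for d in (10, 10**3, 10**6):
    print(d, k_nemirovski(d), k_type2(d), k_trunc_bernstein(d))
```

For very large $d$ the ratio of the second to the first column approaches $4/e$, in line with the limits stated above.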

6 Proofs. 

6.1 Proofs for Section 2

Proof of (6). In case of $r = \infty$, the asserted inequalities read

\[ \|x\|_\infty \le \|x\|_q \le d^{1/q} \|x\|_\infty \quad \text{for } 1 \le q < \infty \]

and are rather obvious. For $1 \le q < r < \infty$, (6) is an easy consequence of Hölder's inequality. □

Proof of Lemma 2.4. In case of $r = 2$, $V(f + g)$ is equal to $V(f) + DV(f, g) + V(g)$. In case of $r > 2$ and $\|f\|_r = 0$, both $DV(f, g)$ and $\int h(f) g \, d\mu$ are equal to zero, and the asserted inequalities reduce to the trivial statement that $V(g) \le (r - 1) V(g)$. Thus let us restrict our attention to the case $r > 2$ and $\|f\|_r > 0$.

Note first that the mapping

\[ \mathbb{R} \ni t \mapsto h_t := |f + tg|^r \]

is pointwise twice continuously differentiable with derivatives

\begin{align*}
\dot{h}_t &= r |f + tg|^{r-1} \mathrm{sign}(f + tg)\, g = r |f + tg|^{r-2} (f + tg)\, g, \\
\ddot{h}_t &= r (r - 1) |f + tg|^{r-2} g^2.
\end{align*}

By means of the inequality $|x + y|^b \le 2^{b-1} (|x|^b + |y|^b)$ for real numbers $x, y$ and $b \ge 1$, a consequence of Jensen's inequality, we can conclude that for any bound $t_o > 0$,

\begin{align*}
\max_{|t| \le t_o} |\dot{h}_t| &\le r 2^{r-2} \bigl( |f|^{r-1} |g| + t_o^{r-1} |g|^r \bigr), \\
\max_{|t| \le t_o} |\ddot{h}_t| &\le r (r - 1) 2^{r-2} \bigl( |f|^{r-2} |g|^2 + t_o^{r-2} |g|^r \bigr).
\end{align*}

The latter two envelope functions belong to $L_1(\mu)$. This follows from Hölder's inequality which we rephrase for our purposes in the form

\[ \int |f|^{(1-\lambda) r} |g|^{\lambda r} \, d\mu \le \|f\|_r^{(1-\lambda) r} \|g\|_r^{\lambda r} \quad \text{for } 0 \le \lambda \le 1. \tag{14} \]

Hence we may conclude via dominated convergence that

\[ t \mapsto \tilde{v}(t) := \|f + tg\|_r^r \]

is twice continuously differentiable with derivatives

\begin{align*}
\tilde{v}'(t) &= r \int |f + tg|^{r-2} (f + tg)\, g \, d\mu, \\
\tilde{v}''(t) &= r (r - 1) \int |f + tg|^{r-2} g^2 \, d\mu.
\end{align*}

This entails that

\[ t \mapsto v(t) := V(f + tg) = \tilde{v}(t)^{2/r} \]

is continuously differentiable with derivative

\[ v'(t) = (2/r)\, \tilde{v}(t)^{2/r - 1} \tilde{v}'(t) = \int h(f + tg)\, g \, d\mu. \]

For $t = 0$ this entails the asserted expression for $DV(f, g)$. Moreover, $v(t)$ is twice continuously differentiable on the set $\{t \in \mathbb{R} : \|f + tg\|_r > 0\}$ which equals either $\mathbb{R}$ or $\mathbb{R} \setminus \{t_o\}$ for some $t_o \ne 0$. On this set the second derivative equals

\begin{align*}
v''(t) &= (2/r)\, \tilde{v}(t)^{2/r - 1} \tilde{v}''(t) + (2/r)(2/r - 1)\, \tilde{v}(t)^{2/r - 2} \tilde{v}'(t)^2 \\
&= 2 (r - 1) \int \frac{|f + tg|^{r-2}}{\|f + tg\|_r^{r-2}}\, g^2 \, d\mu - 2 (r - 2) \Bigl( \int \frac{|f + tg|^{r-2} (f + tg)}{\|f + tg\|_r^{r-1}}\, g \, d\mu \Bigr)^2 \\
&\le 2 (r - 1) \|g\|_r^2 = 2 (r - 1) V(g)
\end{align*}

by virtue of Hölder's inequality (14) with $\lambda = 2/r$. Consequently, by using

\[ v'(t) - v'(0) = \int_0^t v''(s) \, ds \le 2 (r - 1) V(g)\, t, \]

we find that

\begin{align*}
V(f + g) - V(f) - DV(f, g) &= v(1) - v(0) - v'(0) = \int_0^1 \bigl( v'(t) - v'(0) \bigr) \, dt \\
&\le 2 (r - 1) V(g) \int_0^1 t \, dt = (r - 1) V(g).
\end{align*}
□
Proof of Theorem 2.2. The first part is an immediate consequence of the considerations preceding the theorem. It remains to prove the (in)equalities and expansion for $K_{\mathrm{Nem}}(d, r)$. Note that $K_{\mathrm{Nem}}(d, r)$ is the infimum of $h(q)\, d^{-2/r}$ over all real $q \in [2, r]$, where $h(q) := (q - 1)\, d^{2/q}$ satisfies the equation

\[ h'(q) = \frac{d^{2/q}}{q^2} \bigl( (q - \log d)^2 - (\log d - 2) \log d \bigr). \]

Since $7 < e^2 < 8$, this shows that $h$ is strictly increasing on $[2, \infty)$ if $d \le 7$. Hence

\[ K_{\mathrm{Nem}}(d, r) = h(2)\, d^{-2/r} = d^{1 - 2/r} \quad \text{if } d \le 7. \]

For $d \ge 8$, one can easily show that $\log d - \sqrt{(\log d - 2) \log d} < 2$, so that $h$ is strictly decreasing on $[2, r_d]$ and strictly increasing on $[r_d, \infty)$, where

\[ r_d := \log d + \sqrt{(\log d - 2) \log d} \begin{cases} < 2 \log d, \\ > 2 \log d - 2. \end{cases} \]

Thus for $d \ge 8$,

\[ K_{\mathrm{Nem}}(d, r) = \begin{cases} h(r)\, d^{-2/r} = r - 1 < 2 \log d - 1 & \text{if } r \le r_d, \\ h(r_d)\, d^{-2/r} \le h(2 \log d) = 2e \log d - e & \text{if } r \ge r_d. \end{cases} \]

Moreover, one can verify numerically that $K_{\mathrm{Nem}}(d, r) \le d \le 2e \log d - e$ for $3 \le d \le 7$. Finally, for $d \ge 8$, the inequalities $r_d' := 2 \log d - 2 < r_d < r_d'' := 2 \log d$ yield

\[ K_{\mathrm{Nem}}(d, \infty) = h(r_d) \ge (r_d' - 1)\, d^{2/r_d''} = 2e \log d - 3e, \]

and for $1 \le d \le 7$, the inequality $d = K_{\mathrm{Nem}}(d, \infty) \ge 2e \log(d) - 3e$ is easily verified. □

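6.2 Proofs for Section 3

Proof of Lemma 3.2. The following proof is standard; see e.g. [1], page 160, [13], page 247. Let $x_1, \ldots, x_n$ be fixed functions in $L_r(\mu)$. Then by [8], for any $t \in T$,

\[ \Bigl( \mathbb{E} \Bigl| \sum_{i=1}^n \epsilon_i x_i(t) \Bigr|^r \Bigr)^{1/r} \le B_r \Bigl( \sum_{i=1}^n |x_i(t)|^2 \Bigr)^{1/2}. \tag{15} \]

To use inequality (15) for finding an upper bound for the type constant for $L_r$, rewrite it as

\[ \mathbb{E} \Bigl| \sum_{i=1}^n \epsilon_i x_i(t) \Bigr|^r \le B_r^r \Bigl( \sum_{i=1}^n |x_i(t)|^2 \Bigr)^{r/2}. \]

It follows from Fubini's theorem and the previous inequality that

\begin{align*}
\mathbb{E} \Bigl\| \sum_{i=1}^n \epsilon_i x_i \Bigr\|_r^r &= \mathbb{E} \int \Bigl| \sum_{i=1}^n \epsilon_i x_i(t) \Bigr|^r d\mu(t) = \int \mathbb{E} \Bigl| \sum_{i=1}^n \epsilon_i x_i(t) \Bigr|^r d\mu(t) \\
&\le B_r^r \int \Bigl( \sum_{i=1}^n |x_i(t)|^2 \Bigr)^{r/2} d\mu(t).
\end{align*}

Using the triangle inequality (or Minkowski's inequality), we obtain

\begin{align*}
\Bigl( \mathbb{E} \Bigl\| \sum_{i=1}^n \epsilon_i x_i \Bigr\|_r^r \Bigr)^{2/r} &\le B_r^2 \Bigl( \int \Bigl( \sum_{i=1}^n |x_i(t)|^2 \Bigr)^{r/2} d\mu(t) \Bigr)^{2/r} \\
&\le B_r^2 \sum_{i=1}^n \Bigl( \int |x_i(t)|^r \, d\mu(t) \Bigr)^{2/r} \\
&= B_r^2 \sum_{i=1}^n \|x_i\|_r^2.
\end{align*}

Furthermore, since $g(u) = u^{2/r}$ is a concave function of $u \ge 0$, the last display implies that

\[ \mathbb{E} \Bigl\| \sum_{i=1}^n \epsilon_i x_i \Bigr\|_r^2 \le \Bigl( \mathbb{E} \Bigl\| \sum_{i=1}^n \epsilon_i x_i \Bigr\|_r^r \Bigr)^{2/r} \le B_r^2 \sum_{i=1}^n \|x_i\|_r^2. \]
□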



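Proof of Lemma 3.4. For $1 \le i \le n$ let $x_i = (x_{im})_{m=1}^d$ be an arbitrary fixed vector in $\mathbb{R}^d$, and set $S := \sum_{i=1}^n \epsilon_i x_i$. Further let $S_m$ be the $m$-th component of $S$ with variance $v_m^2 := \sum_{i=1}^n x_{im}^2$, and define $v^2 := \max_{1 \le m \le d} v_m^2$, which is not greater than $\sum_{i=1}^n \|x_i\|_\infty^2$. It suffices to show that

\[ \mathbb{E}\|S\|_\infty^2 \le 2 \log(2d)\, v^2. \]

To this end note first that $h : [0, \infty) \to [1, \infty)$ with $h(t) := \cosh(t^{1/2}) = \sum_{k=0}^\infty t^k / (2k)!$ is bijective, increasing and convex. Hence its inverse function $h^{-1} : [1, \infty) \to [0, \infty)$ is increasing and concave, and one easily verifies that $h^{-1}(s) = \bigl( \log(s + (s^2 - 1)^{1/2}) \bigr)^2 \le (\log(2s))^2$. Thus it follows from Jensen's inequality that for arbitrary $t > 0$,

\begin{align*}
\mathbb{E}\|S\|_\infty^2 &= t^{-2}\, \mathbb{E}\, h^{-1}\bigl( \cosh(\|tS\|_\infty) \bigr) \le t^{-2} h^{-1}\bigl( \mathbb{E} \cosh(\|tS\|_\infty) \bigr) \\
&\le t^{-2} \bigl( \log(2\, \mathbb{E} \cosh(\|tS\|_\infty)) \bigr)^2.
\end{align*}

Moreover,

\[ \mathbb{E} \cosh(\|tS\|_\infty) = \mathbb{E} \max_{1 \le m \le d} \cosh(t S_m) \le \sum_{m=1}^d \mathbb{E} \cosh(t S_m) \le d \exp(t^2 v^2 / 2), \]

according to (8), whence

\[ \mathbb{E}\|S\|_\infty^2 \le t^{-2} \bigl( \log(2d \exp(t^2 v^2 / 2)) \bigr)^2 = \bigl( \log(2d)/t + t v^2 / 2 \bigr)^2. \]

Now the assertion follows if we set $t = \sqrt{2 \log(2d) / v^2}$. □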



Proof of (9). We may replace the random sequence $\{X_i\}$ in Example 1.2 with the random sequence $\{\epsilon_i X_i\}$, where $\{\epsilon_i\}$ is a Rademacher sequence independent of $\{X_i\}$. Thereafter we condition on $\{X_i\}$, i.e. we view it as a deterministic sequence such that $n^{-1} \sum_{i=1}^n X_i X_i^\top$ converges to the identity matrix $I_d$ as $n \to \infty$, by the strong law of large numbers. Now Lindeberg's version of the multivariate Central Limit Theorem shows that

\[ \sup_{n \ge 1} \frac{\mathbb{E} \bigl\| \sum_{i=1}^n \epsilon_i X_i \bigr\|_\infty^2}{\sum_{i=1}^n \|X_i\|_\infty^2} \ge \sup_{n \ge 1} \mathbb{E} \Bigl\| n^{-1/2} \sum_{i=1}^n \epsilon_i X_i \Bigr\|_\infty^2 \ge c_d^2. \]
□
Inequalities for $\Phi$. The subsequent results will rely on (10) and several inequalities for $1 - \Phi(z)$. The first of these is:

\[ 1 - \Phi(z) \le z^{-1} \phi(z), \quad z > 0, \tag{16} \]

which is known as Mills' ratio; see [6] and [19] for related results. The proof of this upper bound is easy: Since $\phi'(z) = -z \phi(z)$ it follows that

\[ 1 - \Phi(z) = \int_z^\infty \phi(t) \, dt \le \int_z^\infty \frac{t}{z}\, \phi(t) \, dt = -\frac{1}{z} \int_z^\infty \phi'(t) \, dt = \frac{\phi(z)}{z}. \tag{17} \]

A very useful pair of upper and lower bounds for $1 - \Phi(z)$ are as follows:

\[ \frac{2}{z + \sqrt{z^2 + 4}}\, \phi(z) \le 1 - \Phi(z) \le \frac{4}{3z + \sqrt{z^2 + 8}}\, \phi(z), \quad z > -1; \tag{18} \]

the inequality on the left is due to Komatsu (see e.g. [10] p. 17), while the inequality on the right is an improvement of an earlier result of Komatsu due to [23].
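How tight these bounds are is easiest to see numerically. The sketch below compares the Gaussian tail $1 - \Phi(z)$ with the lower bound $2\phi(z)/(z + \sqrt{z^2+4})$, the upper bound $4\phi(z)/(3z + \sqrt{z^2+8})$ and, for $z > 0$, the Mills' ratio bound $\phi(z)/z$. It is written in Python and computes the tail via the complementary error function; the function names are ours.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def upper_tail(z):
    """1 - Phi(z), computed through the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def komatsu_lower(z):
    return 2.0 * phi(z) / (z + math.sqrt(z * z + 4.0))

def refined_upper(z):
    return 4.0 * phi(z) / (3.0 * z + math.sqrt(z * z + 8.0))

def mills_upper(z):
    return phi(z) / z  # only meaningful for z > 0

for z in (-0.9, 0.0, 0.5, 1.0, 2.0, 4.0, 6.0):
    print(z, komatsu_lower(z), upper_tail(z), refined_upper(z))
```

For $z > 0$ the refined upper bound is never larger than the Mills' ratio bound, since $4z \le 3z + \sqrt{z^2 + 8}$.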



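Proof of Lemma 3.6. To prove the upper bound for $T_2(\ell_\infty^d)$, let $(\epsilon_i)_{i \ge 1}$ be a Rademacher sequence. With $S$ and $S_m$ as in the proof of Lemma 3.4, for any $\delta > 0$ we may write

\begin{align*}
\mathbb{E}\|S\|_\infty^2 &= \int_0^\infty 2t\, \mathbb{P}\Bigl( \sup_{1 \le m \le d} |S_m| > t \Bigr) dt \\
&\le \delta^2 + \int_\delta^\infty 2t\, \mathbb{P}\Bigl( \sup_{1 \le m \le d} |S_m| > t \Bigr) dt \\
&\le \delta^2 + \sum_{m=1}^d \int_\delta^\infty 2t\, \mathbb{P}(|S_m| > t) \, dt.
\end{align*}

Now by (10) with $v^2$ and $v_m^2$ as in the proof of Lemma 3.4, followed by Mills' ratio (16),

\begin{align*}
\int_\delta^\infty 2t\, \mathbb{P}(|S_m| > t) \, dt &\le \int_\delta^\infty 4K t\, \bigl( 1 - \Phi(t / v_m) \bigr) \, dt \le \frac{4K v_m}{\sqrt{2\pi}} \int_\delta^\infty e^{-t^2 / (2 v_m^2)} \, dt \\
&= 4K v_m^2 \int_\delta^\infty \frac{e^{-t^2 / (2 v_m^2)}}{\sqrt{2\pi}\, v_m} \, dt \\
&= 4K v_m^2 \bigl( 1 - \Phi(\delta / v_m) \bigr) \le 4K v^2 \bigl( 1 - \Phi(\delta / v) \bigr). \tag{19}
\end{align*}

Now instead of the Mills' ratio bound (16) for the tail of the normal distribution, we use the upper bound part of (18) due to [23]. This yields

\[ \int_\delta^\infty 2t\, \mathbb{P}(|S_m| > t) \, dt \le 4K v^2 \bigl( 1 - \Phi(\delta / v) \bigr) \le \frac{4 c v^2}{3 \delta / v + \sqrt{\delta^2 / v^2 + 8}}\, e^{-\delta^2 / (2 v^2)}, \]

where we have defined $c := 4K / \sqrt{2\pi} = 12.88 / \sqrt{2\pi}$, and hence

\[ \mathbb{E}\|S\|_\infty^2 \le \delta^2 + \frac{4 c d v^2}{3 \delta / v + \sqrt{\delta^2 / v^2 + 8}}\, e^{-\delta^2 / (2 v^2)}. \]

Taking

\[ \delta^2 = v^2\, 2 \log\Bigl( \frac{c d / 2}{\sqrt{2 \log(c d / 2)}} \Bigr) \]

gives

\begin{align*}
\mathbb{E}\|S\|_\infty^2 &\le v^2 \Bigl\{ 2 \log d + 2 \log(c/2) - \log(2 \log(d c / 2)) \\
&\qquad + \frac{8 \sqrt{2 \log(c d / 2)}}{3 \sqrt{2 \log\bigl( \frac{c d / 2}{\sqrt{2 \log(c d / 2)}} \bigr)} + \sqrt{2 \log\bigl( \frac{c d / 2}{\sqrt{2 \log(c d / 2)}} \bigr) + 8}} \Bigr\} \\
&=: v^2 \bigl\{ 2 \log d + h_2(d) \bigr\},
\end{align*}

where it is easily checked that $h_2(d) \le 3$ for all $d \ge 1$. Moreover $h_2(d)$ is negative for $d > 4.13795 \times 10^{10}$. This completes the proof of the upper bound in (11).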

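To prove the lower bound for $c_d$ in (11), we use the lower bound of [13], Lemma 6.9, page 157 (which is, in this form, due to [5]). This yields

\[ c_d^2 \ge \frac{\lambda}{1 + \lambda}\, t_o^2 + \frac{1}{1 + \lambda}\, d \int_{t_o}^\infty 4t\, (1 - \Phi(t)) \, dt \tag{20} \]

for any $t_o > 0$, where $\lambda = 2d (1 - \Phi(t_o))$. By using Komatsu's lower bound (18), we find that

\begin{align*}
\int_{t_o}^\infty t\, (1 - \Phi(t)) \, dt &\ge \int_{t_o}^\infty \frac{2t}{t + \sqrt{t^2 + 4}}\, \phi(t) \, dt \\
&\ge \frac{2 t_o}{t_o + \sqrt{t_o^2 + 4}} \int_{t_o}^\infty \phi(t) \, dt \\
&= \frac{2}{1 + \sqrt{1 + 4 / t_o^2}}\, (1 - \Phi(t_o)).
\end{align*}

Using this lower bound in (20) yields

\begin{align*}
c_d^2 &\ge \frac{\lambda}{1 + \lambda}\, t_o^2 + \frac{1}{1 + \lambda} \cdot \frac{8 d}{1 + \sqrt{1 + 4 / t_o^2}}\, (1 - \Phi(t_o)) \\
&= \frac{2d (1 - \Phi(t_o))}{1 + 2d (1 - \Phi(t_o))} \Bigl\{ t_o^2 + \frac{4}{1 + \sqrt{1 + 4 / t_o^2}} \Bigr\} \\
&\ge \frac{\frac{4d}{t_o + \sqrt{t_o^2 + 4}}\, \phi(t_o)}{1 + \frac{4d}{t_o + \sqrt{t_o^2 + 4}}\, \phi(t_o)} \Bigl\{ t_o^2 + \frac{4}{1 + \sqrt{1 + 4 / t_o^2}} \Bigr\}. \tag{21}
\end{align*}

Now we let $c := \sqrt{2/\pi}$ and $\delta > 0$ and choose

\[ t_o^2 = 2 \log\Bigl( \frac{c d}{(2 \log(c d))^{(1 + \delta)/2}} \Bigr). \]

For this choice we see that $t_o \to \infty$ as $d \to \infty$,

\[ 4d\, \phi(t_o) = \frac{4d}{\sqrt{2\pi}} \cdot \frac{(2 \log(c d))^{(1 + \delta)/2}}{c d} = 2 (2 \log(c d))^{(1 + \delta)/2}, \]

and

\[ \frac{4d\, \phi(t_o)}{t_o} = \frac{2 (2 \log(c d))^{(1 + \delta)/2}}{\bigl\{ 2 \log\bigl( c d / (2 \log(c d))^{(1 + \delta)/2} \bigr) \bigr\}^{1/2}} \to \infty \]

as $d \to \infty$, so the first term on the RHS of (21) converges to 1 as $d \to \infty$. Writing $A_d$ for this first term, the RHS of (21) can be rewritten as

\begin{align*}
A_d \Bigl\{ t_o^2 + \frac{4}{1 + \sqrt{1 + 4 / t_o^2}} \Bigr\} &= A_d \Bigl\{ 2 \log\Bigl( \frac{c d}{(2 \log(c d))^{(1 + \delta)/2}} \Bigr) + \frac{4}{1 + \sqrt{1 + 4 / t_o^2}} \Bigr\} \\
&\sim 1 \cdot \bigl\{ 2 \log d + 2 \log c - (1 + \delta) \log(2 \log(c d)) + 2 \bigr\}.
\end{align*}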

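To prove the upper bounds for $c_d$, we will use the upper bound of [13], Lemma 6.9, page 157 (which is, in this form, due to [5]). For every $t_o > 0$,

\begin{align*}
c_d^2 = \mathbb{E} \max_{1 \le j \le d} |Z_j|^2 &\le t_o^2 + d \int_{t_o}^\infty 2t\, \mathbb{P}(|Z_1| > t) \, dt \\
&= t_o^2 + 4d \int_{t_o}^\infty t\, (1 - \Phi(t)) \, dt \\
&\le t_o^2 + 4d \int_{t_o}^\infty \phi(t) \, dt \quad \text{(by Mills' ratio)} \\
&= t_o^2 + 4d (1 - \Phi(t_o)).
\end{align*}

Evaluating this bound at $t_o = \sqrt{2 \log(d / \sqrt{2\pi})}$ and then using Mills' ratio again yields

\begin{align*}
c_d^2 &\le 2 \log(d / \sqrt{2\pi}) + 4d \bigl( 1 - \Phi\bigl( \sqrt{2 \log(d / \sqrt{2\pi})} \bigr) \bigr) \\
&\le 2 \log d - \log(2\pi) + 4d\, \frac{\phi\bigl( \sqrt{2 \log(d / \sqrt{2\pi})} \bigr)}{\sqrt{2 \log(d / \sqrt{2\pi})}} \\
&= 2 \log d - \log(2\pi) + \frac{2 \sqrt{2}}{\sqrt{\log(d / \sqrt{2\pi})}} \tag{22} \\
&\le 2 \log d,
\end{align*}

where the last inequality holds if

\[ \frac{2 \sqrt{2}}{\sqrt{\log(d / \sqrt{2\pi})}} \le \log(2\pi), \]

or equivalently if

\[ \log d \ge \frac{8}{(\log(2\pi))^2} + \frac{\log(2\pi)}{2} = 3.28735\ldots, \]

and hence if $d \ge 27 > e^{3.28735\ldots} = 26.77$. The claimed inequality is easily verified numerically for $d = 3, \ldots, 26$. (It fails for $d = 2$.) As can be seen from (22), $2 \log d - \log(2\pi)$ gives a reasonable approximation to $\mathbb{E} \max_{1 \le j \le d} Z_j^2$ for large $d$. Using the upper bound in (18) instead of the second application of Mills' ratio and choosing $t_o^2 = 2 \log\bigl( c d / \sqrt{2 \log(c d)} \bigr)$ with $c := \sqrt{2/\pi}$ yields the third bound for $c_d$ in (11) with

\[ h_3(d) = -\log(\pi) - \log(\log(c d)) + \frac{8}{3 \sqrt{1 - \frac{\log(2 \log(c d))}{2 \log(c d)}} + \sqrt{1 - \frac{\log(2 \log(c d))}{2 \log(c d)} + \frac{4}{\log(c d)}}}. \]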
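6.3 Proofs for Section 4

Proof of Lemma 4.1. It follows from $\mathbb{E} Y = 0$, the Taylor expansion of the exponential function and the inequality $\mathbb{E} |Y|^m \le \sigma^2 \kappa^{m-2}$ for $m \ge 2$ that

\begin{align*}
\mathbb{E} \exp\Bigl( \frac{Y}{\kappa L} \Bigr) &= 1 + \mathbb{E} \Bigl\{ \exp\Bigl( \frac{Y}{\kappa L} \Bigr) - 1 - \frac{Y}{\kappa L} \Bigr\} \\
&\le 1 + \sum_{m=2}^\infty \frac{1}{m!} \frac{\mathbb{E} |Y|^m}{(\kappa L)^m} \le 1 + \frac{\sigma^2}{\kappa^2} \sum_{m=2}^\infty \frac{1}{m!\, L^m} = 1 + \frac{\sigma^2 e(L)}{\kappa^2}.
\end{align*}
□

Proof of Lemma 4.2. Applying Lemma 4.1 to the $j$-th components $X_{i,j}$ of $X_i$ and $S_{n,j}$ of $S_n$ yields for all $L > 0$,

\[ \mathbb{E} \exp\Bigl( \pm \frac{S_{n,j}}{\kappa L} \Bigr) = \prod_{i=1}^n \mathbb{E} \exp\Bigl( \pm \frac{X_{i,j}}{\kappa L} \Bigr) \le \prod_{i=1}^n \exp\Bigl( \frac{\mathrm{Var}(X_{i,j})\, e(L)}{\kappa^2} \Bigr) \le \exp\Bigl( \frac{\Gamma e(L)}{\kappa^2} \Bigr). \]

Hence

\[ \mathbb{E} \cosh\Bigl( \frac{\|S_n\|_\infty}{\kappa L} \Bigr) = \mathbb{E} \max_{1 \le j \le d} \cosh\Bigl( \frac{S_{n,j}}{\kappa L} \Bigr) \le \sum_{j=1}^d \mathbb{E} \cosh\Bigl( \frac{S_{n,j}}{\kappa L} \Bigr) \le d \exp\Bigl( \frac{\Gamma e(L)}{\kappa^2} \Bigr). \]

As in the proof of Lemma 3.4 we conclude that

\begin{align*}
\mathbb{E}\|S_n\|_\infty^2 &\le (\kappa L)^2 \Bigl( \log\Bigl( 2\, \mathbb{E} \cosh\Bigl( \frac{\|S_n\|_\infty}{\kappa L} \Bigr) \Bigr) \Bigr)^2 \\
&\le (\kappa L)^2 \Bigl( \log(2d) + \frac{\Gamma e(L)}{\kappa^2} \Bigr)^2 \\
&= \Bigl( \kappa L \log(2d) + \frac{\Gamma L\, e(L)}{\kappa} \Bigr)^2,
\end{align*}

which is equivalent to the inequality stated in the lemma. □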

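Proof of Theorem 4.3. For fixed $\kappa_o > 0$ we split $S_n$ into $A_n + B_n$ as described before. Let us bound the sum $B_n$ first: For this term we have

\begin{align*}
\|B_n\|_\infty &\le \sum_{i=1}^n \Bigl\{ 1_{[\|X_i\|_\infty > \kappa_o]} \|X_i\|_\infty + \mathbb{E}\bigl( 1_{[\|X_i\|_\infty > \kappa_o]} \|X_i\|_\infty \bigr) \Bigr\} \\
&= \sum_{i=1}^n \Bigl\{ 1_{[\|X_i\|_\infty > \kappa_o]} \|X_i\|_\infty - \mathbb{E}\bigl( 1_{[\|X_i\|_\infty > \kappa_o]} \|X_i\|_\infty \bigr) \Bigr\} + 2 \sum_{i=1}^n \mathbb{E}\bigl( 1_{[\|X_i\|_\infty > \kappa_o]} \|X_i\|_\infty \bigr) \\
&=: B_{n1} + B_{n2}.
\end{align*}

Therefore, since $\mathbb{E} B_{n1} = 0$,

\begin{align*}
\mathbb{E}\|B_n\|_\infty^2 &\le \mathbb{E} (B_{n1} + B_{n2})^2 = \mathbb{E} B_{n1}^2 + B_{n2}^2 \\
&= \sum_{i=1}^n \mathrm{Var}\bigl( 1_{[\|X_i\|_\infty > \kappa_o]} \|X_i\|_\infty \bigr) + 4 \Bigl( \sum_{i=1}^n \mathbb{E}\bigl( \|X_i\|_\infty 1_{[\|X_i\|_\infty > \kappa_o]} \bigr) \Bigr)^2 \\
&\le \sum_{i=1}^n \mathbb{E}\|X_i\|_\infty^2 + 4 \Bigl( \sum_{i=1}^n \frac{\mathbb{E}\|X_i\|_\infty^2}{\kappa_o} \Bigr)^2 \\
&= \Gamma + 4\, \frac{\Gamma^2}{\kappa_o^2},
\end{align*}

where we define $\Gamma := \sum_{i=1}^n \mathbb{E}\|X_i\|_\infty^2$.

The first sum, $A_n$, may be bounded by means of Lemma 4.2 with $\kappa = 2\kappa_o$, utilizing the bound

\[ \mathrm{Var}\bigl( X_{i,j}^{(a)} \bigr) \le \mathbb{E}\bigl( (X_{i,j}^{(a)})^2 \bigr) \le \mathbb{E}\|X_i\|_\infty^2. \]

Thus

\[ \mathbb{E}\|A_n\|_\infty^2 \le \Bigl( 2\kappa_o L \log(2d) + \frac{\Gamma L\, e(L)}{2\kappa_o} \Bigr)^2. \]

Combining the bounds we find that

\begin{align*}
\sqrt{\mathbb{E}\|S_n\|_\infty^2} &\le \sqrt{\mathbb{E}\|A_n\|_\infty^2} + \sqrt{\mathbb{E}\|B_n\|_\infty^2} \\
&\le 2\kappa_o L \log(2d) + \frac{\Gamma L\, e(L)}{2\kappa_o} + \sqrt{\Gamma} + 2\, \frac{\Gamma}{\kappa_o} \\
&= \alpha \kappa_o + \frac{\beta}{\kappa_o} + \sqrt{\Gamma},
\end{align*}

where $\alpha := 2L \log(2d)$ and $\beta := \Gamma (L\, e(L) + 4)/2$. This bound is minimized if $\kappa_o = \sqrt{\beta / \alpha}$ with minimum value

\[ 2 \sqrt{\alpha \beta} + \sqrt{\Gamma} = \Bigl( 1 + 2 \sqrt{L^2 e(L) + 4L}\, \sqrt{\log(2d)} \Bigr) \sqrt{\Gamma}, \]

and for $L = 0.407$ the latter bound is not greater than

\[ \bigl( 1 + 3.46 \sqrt{\log(2d)} \bigr) \sqrt{\Gamma}. \]

In the special case of symmetrically distributed random vectors $X_i$, our treatment of the sum $B_n$ does not change, but in the bound for $\mathbb{E}\|A_n\|_\infty^2$ one may replace $2\kappa_o$ with $\kappa_o$, because $\mathbb{E} X_i^{(a)} = 0$. Thus

\begin{align*}
\sqrt{\mathbb{E}\|S_n\|_\infty^2} &\le \kappa_o L \log(2d) + \frac{\Gamma L\, e(L)}{\kappa_o} + \sqrt{\Gamma} + 2\, \frac{\Gamma}{\kappa_o} \\
&= \alpha' \kappa_o + \frac{\beta'}{\kappa_o} + \sqrt{\Gamma} \quad \text{(with } \alpha' := L \log(2d),\ \beta' := \Gamma (L\, e(L) + 2)\text{)} \\
&= \Bigl( 1 + 2 \sqrt{L^2 e(L) + 2L}\, \sqrt{\log(2d)} \Bigr) \sqrt{\Gamma} \quad \text{(if } \kappa_o = \sqrt{\beta' / \alpha'}\text{)}.
\end{align*}

For $L = 0.5$ the latter bound is not greater than

\[ \bigl( 1 + 2.9 \sqrt{\log(2d)} \bigr) \sqrt{\Gamma}. \]
□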
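The numerical constants 3.46 and 2.9 come from minimizing the coefficient of $\sqrt{\log(2d)}$ over $L$, namely $2\sqrt{L^2 e(L) + 4L}$ in the centered case and $2\sqrt{L^2 e(L) + 2L}$ in the symmetric case, with $e(L) = \exp(1/L) - 1 - 1/L$. The following Python sketch evaluates both at the values of $L$ used in the proof and also runs a crude grid search; the function names and the grid are our own illustrative choices.

```python
import math

def e_fun(L):
    """e(L) = exp(1/L) - 1 - 1/L from Lemma 4.1."""
    return math.exp(1.0 / L) - 1.0 - 1.0 / L

def centered_factor(L):
    """Coefficient of sqrt(log(2d)) in the centered case."""
    return 2.0 * math.sqrt(L * L * e_fun(L) + 4.0 * L)

def symmetric_factor(L):
    """Coefficient of sqrt(log(2d)) in the symmetric case."""
    return 2.0 * math.sqrt(L * L * e_fun(L) + 2.0 * L)

grid = [0.2 + 0.001 * k for k in range(801)]  # L in [0.2, 1.0]
best_centered = min(grid, key=centered_factor)
best_symmetric = min(grid, key=symmetric_factor)
print(centered_factor(0.407), best_centered)
print(symmetric_factor(0.5), best_symmetric)
```

On this grid the minimizers should land close to $L = 0.407$ and $L = 0.5$, which suggests that the constants in Theorem 4.3 cannot be improved noticeably by a different choice of $L$ within this argument.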